-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky maxSubmitRetries test #227
Conversation
This is a slightly speculative fix for a test that fails intermittently on `sharedb-mongo`. I believe these intermittent failures are due to a race condition in a concurrency test. The test works by attempting to fire two commits off at the same time, and hoping that one of them is committed just before the other, so that a `SubmitRequest.retry` is triggered whilst the `maxSubmitRetries` is set to `0`, resulting in an error that is expected. However, I believe it's possible for these commits to (in some cases) happen sequentially rather than concurrently, and fail to error. This change attempts to force them into this retry condition by: - Catching both ops in the `commit` middleware, _just_ before they're about to be committed (and hit a `retry` if applicable) - Waiting until both ops have reached this state - Triggering the first op's `commit` - Then in the callback of that op, triggering the second op's `commit` - The second op should now find that the first op has beaten it to committing, and trigger a `retry`
See [this issue][1]. [1]: #226 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good sleuthing!
I believe that the second client's commit will hit the retry case as long as both clients get the same snapshot back from backend.db.getSnapshot
in SubmitRequest.submit
:
https://github.com/share/sharedb/blob/v1.0.0-beta.8/lib/submit-request.js#L45
Delaying the sending of commits is a pretty neat way of doing it, and I like how clean it ended up being too. 👍
test/client/submit.js
Outdated
}); | ||
doc.submitOp({p: ['age'], na: 2}, function (error) { | ||
if (error) return done(error); | ||
doc2Callback(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can move this up to just below docCallback()
, as by that point both clients will have already fetched the snapshot in submit
and gone through apply
.
Having the doc2Callback()
call up there makes it a bit more obvious that the middleware is deferring the first callback until the second one is ready, and calling both in sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. But isn't the commit
hook just before they've been committed to the db? ie there's still a potential race condition if one of them gets committed before the other. That is, we don't know which callback will finish first, and get forced into retry
...? This way we guarantee that doc
will always be committed before doc2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(That could be mitigated by putting back the old shared callback with a count
, but I find that sort of thing a bit sketchy; it's like admitting that you're not really in control of your own test.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, we don't know which callback will finish first, and get forced into retry...? This way we guarantee that doc will always be committed before doc2.
(That could be mitigated by putting back the old shared callback with a count, but I find that sort of thing a bit sketchy; it's like admitting that you're not really in control of your own test.)
Got it, you want to sequence the actual commits so you know 1 goes before 2. Makes sense, thanks for explaining!
test/client/submit.js
Outdated
done(); | ||
var docCallback; | ||
var doc2Callback; | ||
backend.use('commit', function (request, callback) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I and future contributors (like future you!) would appreciate it if you added some comments here about why this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Forgot to drop this note for the future - the failing test and its output:
The failure can be reproduced artificially by wrapping the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the explanations to the test!
I can merge this, but I can't publish it to NPM since I'm not a maintainer on the sharedb
NPM package.
@nateps - Can you please handle merging, versioning, and publishing this? This will unblock some other PRs on e.g. sharedb-mongo that include and run this test.
.travis.yml
Outdated
@@ -5,6 +5,6 @@ node_js: | |||
- "8" | |||
- "6" | |||
- "4" | |||
script: "npm run jshint && npm run test-cover" | |||
script: "ln -s .. node_modules/sharedb; npm run jshint && npm run test-cover" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping another note for the future, this is to work around the circular dependency issue between sharedb and sharedb-mingo-memory:
#226 (comment)
I'm OK with this workaround for now to unblock the tests as the workaround is isolated to Travis, until we figure out a better solution in the linked issue above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe it's better to get the #226 merged first, as it also updates the Travis config's Node versions to reflect latest + LTE versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Happy to wait and rebase. Would just be nice to get this through to unblock other PRs.
Thanks for the contribution! Fixing flakey tests always brings me great joy. 💥 |
This is a slightly speculative fix for a test that fails intermittently
on
sharedb-mongo
. I believe these intermittent failures are due to arace condition in a concurrency test.
The test works by attempting to fire two commits off at the same time,
and hoping that one of them is committed just before the other, so that
a
SubmitRequest.retry
is triggered whilst themaxSubmitRetries
isset to
0
, resulting in an error that is expected.However, I believe it's possible for these commits to (in some cases)
happen sequentially rather than concurrently, and fail to error.
This change attempts to force them into this retry condition by:
commit
middleware, just before they'reabout to be committed (and hit a
retry
if applicable)commit
commit
committing, and trigger a
retry